Abstract

First we define a unification grammar formalism called the Tree Homomorphic Feature 
Structure Grammar. It is based on Lexical Functional Grammar (LFG), but has a strong 
restriction on the syntax of the equations. We then show that this grammar formalism 
defines a full abstract family of languages, and that it is capable of describing cross-serial 
dependencies of the type found in Swiss German. 

1 Introduction 

Due to their combination of simplicity and flexibility, unification grammars have become
widely used in computational linguistics in the last fifteen years. But this flexibility results in
a very powerful formalism. As a result of this power, the membership problem for unification 
grammars in their most general form is undecidable. Therefore most such grammars have 
restrictions to make them decidable, e.g. the off-line parsability constraint in LFG [KB82]. 
Even so, the membership problem is NP-complete [BBR87] or harder for most unification 
grammar formalisms. It is therefore interesting to study further restrictions on unification 
grammars. Most such studies have been concerned with making the formalism decidable 
[Joh91, Joh94]. But there has also been work on formalisms for which the membership 
problem can be decided in polynomial time. GPSG [GKPS85] which was one of the first 
unification grammar formalisms has only a finite number of possible feature structures and 
describes the class of context-free languages. Then it follows that we can decide in polynomial 
time if a given string is a member of the language generated by a GPSG-grammar. In their 
work, Keller and Weir [KW95] define a grammar formalism with feature structures for which 
the membership problem can be decided in polynomial time. Here there is no common feature 
structure for the sentence as a whole, only feature structures annotated to the nodes in a 
phrase structure tree, with only limited possibilities to share information. In this paper we 
will study a formalism that lies somewhere in between the most powerful formalisms and 
the most limited ones. The grammar formalism that we define is based on work by Colban 
[Col91]. 

What we here call unification grammars are also called attribute-value grammars, feature-structure
grammars and constraint-based grammars. We may divide them into two major
groups, those based on a phrase-structure backbone such as LFG and PATR, and those
entirely described using feature structures such as HPSG [PS94]. We will here use a context-free
phrase structure backbone and add equations to the nodes in the phrase-structure tree
as in LFG. These equations will describe feature structures. Due to a restriction that we 
will impose on the equations in the grammar, the feature structures will be trees that are 
homomorphic with the phrase structure tree. This homomorphism is interesting from a 
computational point of view. 

2 Feature structures 

One of the main characteristics of unification grammars is that they are information based. 
This information is inductively collected from the sentence's substrings, sub-substrings and
so on. We will use feature structures to represent this information. There are many ways of 
viewing, defining and describing feature structures, e.g. as directed acyclic graphs [Shi86], as 
finite deterministic automata [KR90], as models for first order logic [Smo88, Smo92, Joh88], 
or as Kripke frames for modal logic [Bla94]. Here we use a slightly modified version of
Kasper and Rounds' [KR90] definition of feature structures, and we will later use a subset
of the equation schemata used by LFG to describe these feature structures. As a basis we
assume two predefined sets, one of attribute symbols and one of value symbols. In a linguistic 
framework these sets will typically include things like subject, object, number, case etc. as 
attribute symbols, and singular, plural, dative, accusative etc. as value symbols. 

Definition 1 A feature structure $M$ over the set of attribute symbols $A$ and value symbols
$V$ is a 4-tuple $(Q, f_D, \delta_0, \alpha)$ where

• $Q$ is a finite set of nodes,

• $f_D : D \to Q$ is a function, called the name mapping function, where $D$ is a finite set
of names,

• $\delta_0 : Q \times A \to Q$ is a partial function, called the transition function,

• $\alpha : Q \to V$ is a partial function, called the atomic value function.

We extend the transition function $\delta_0$ to a partial function $\delta : (Q \times A^*) \to Q$ as follows:
1) for every $q \in Q$, $\delta(q, \varepsilon) = q$; 2) if $\delta(q_1, w) = q_2$ and $\delta_0(q_2, a) = q_3$ then $\delta(q_1, wa) = q_3$, for
every $q_1, q_2, q_3 \in Q$, $w \in A^*$ and $a \in A$.

A feature structure is well defined if it is

• atomic: For all $q \in Q$, if $\alpha(q)$ is defined, then $\delta_0(q, a)$ is not defined for any $a \in A$.

• acyclic: For all $q \in Q$, $\delta(q, w) = q$ if and only if $w = \varepsilon$.

• describable: For all $q \in Q$ there exist an $x \in D$ and a $w \in A^*$ such that $\delta(f_D(x), w) = q$.
All feature structures are required to be well defined. 
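
The three well-definedness conditions can be checked mechanically. The following is a minimal sketch, assuming a dictionary-based encoding of Definition 1; the class name `FeatureStructure`, its method names and the integer node labels are illustrative choices and not part of the formalism.

```python
class FeatureStructure:
    """Sketch of Definition 1: the tuple (Q, f_D, delta_0, alpha)."""

    def __init__(self, nodes, names, delta0, alpha):
        self.nodes = set(nodes)     # Q
        self.names = dict(names)    # f_D : D -> Q
        self.delta0 = dict(delta0)  # partial map (q, a) -> q'
        self.alpha = dict(alpha)    # partial map q -> v

    def delta(self, q, path):
        """Extended transition function; returns None where undefined."""
        for a in path:
            q = self.delta0.get((q, a))
            if q is None:
                return None
        return q

    def is_atomic(self):
        # no node with an atomic value may have outgoing attribute edges
        return all(q not in self.alpha for (q, _a) in self.delta0)

    def is_acyclic(self):
        # depth-first search for a cycle along delta0 edges
        succ = {}
        for (q, _a), r in self.delta0.items():
            succ.setdefault(q, []).append(r)
        state = {}  # node -> "open" (on the current path) or "done"

        def visit(q):
            if state.get(q) == "done":
                return True
            if state.get(q) == "open":
                return False  # reached a node on the current path: a cycle
            state[q] = "open"
            ok = all(visit(r) for r in succ.get(q, []))
            state[q] = "done"
            return ok

        return all(visit(q) for q in self.nodes)

    def is_describable(self):
        # every node must be reachable from some named node
        reached = set(self.names.values())
        frontier = list(reached)
        while frontier:
            q = frontier.pop()
            for (p, _a), r in self.delta0.items():
                if p == q and r not in reached:
                    reached.add(r)
                    frontier.append(r)
        return reached == self.nodes

    def is_well_defined(self):
        return self.is_atomic() and self.is_acyclic() and self.is_describable()


# A small structure with lex and next attributes, and a cyclic counterexample.
fs = FeatureStructure(
    nodes={0, 1, 2, 3, 4},
    names={"x": 0},
    delta0={(0, "lex"): 1, (0, "next"): 2, (2, "lex"): 3, (2, "next"): 4},
    alpha={1: "b", 3: "a", 4: "$"},
)
loop = FeatureStructure({0}, {"x": 0}, {(0, "a"): 0}, {})
```

Here `fs.is_well_defined()` holds, while `loop` fails the acyclicity condition because a nonempty path leads from its only node back to itself.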

We may also view this as a directed acyclic graph where all the edges are labeled with 
attribute symbols and some nodes without out-edges have assigned value symbols. In addition 
we name some nodes, such that each node may have more than one name. We will draw 
feature structures as graphs; an example is shown in Figure 2. 

Some definitions of feature structures require that they have an initial node from which 
one can reach every other node with the extended transition function. We prefer to use the 
name mapping function and to require feature structures to be describable. If, instead of
the name mapping function, we add an initial node $q_0$ and replace the name mapping function by
a definition of $\delta(q_0, x) = f_D(x)$ for all $x$ such that $f_D(x)$ is defined, we get a feature structure
with an initial node as specified from a describable one with names. In the rest of
the paper we will view the domain of names as implicitly defined in the name mapping function $f$
and drop $D$ as subscript.

We use equations to describe feature structures, such that a set of equations describes the 
least feature structure that satisfies all the equations in the set: A feature structure satisfies 
the equation 

$$x_1 w_1 = x_2 w_2 \qquad (1)$$

if and only if $\delta(f(x_1), w_1) = \delta(f(x_2), w_2)$, and the equation

$$x_3 w_3 = v \qquad (2)$$

if and only if $\alpha(\delta(f(x_3), w_3)) = v$, where $x_1, x_2, x_3 \in D$, $w_1, w_2, w_3 \in A^*$ and $v \in V$. We only
allow equations of those two forms, $x_1 w_1 = x_2 w_2$ and $x_3 w_3 = v$. In the grammar formalism
we will even limit this a bit more. 
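
The satisfaction conditions for the two equation forms can be sketched as follows. This is an illustration under assumed encodings: an equation of form (1) is written as the tuple `("path", x1, w1, x2, w2)` and one of form (2) as `("value", x3, w3, v)`; the function names and node labels are hypothetical. The sample equations are the ones obtained for the left subtree of the c-structure in Example 1.

```python
def walk(delta0, q, path):
    """Extended transition function on a dict (q, a) -> q'; None if undefined."""
    for a in path:
        if q is None:
            return None
        q = delta0.get((q, a))
    return q


def satisfies(f, delta0, alpha, eq):
    """Check one equation against the structure given by f, delta0 and alpha."""
    if eq[0] == "path":                       # x1 w1 = x2 w2
        _, x1, w1, x2, w2 = eq
        left = walk(delta0, f.get(x1), w1)
        right = walk(delta0, f.get(x2), w2)
        return left is not None and left == right
    _, x3, w3, v = eq                         # x3 w3 = v
    q = walk(delta0, f.get(x3), w3)
    return q is not None and alpha.get(q) == v


# Names are tree-domain strings ("" stands for the root).
f = {"": "n0", "1": "n0", "11": "n0", "12": "n1", "121": "n1"}
delta0 = {("n0", "lex"): "nb", ("n0", "next"): "n1",
          ("n1", "lex"): "na", ("n1", "next"): "nd"}
alpha = {"nb": "b", "na": "a", "nd": "$"}

equations = [
    ("path", "", [], "1", []),
    ("path", "1", [], "11", []),
    ("value", "11", ["lex"], "b"),
    ("path", "1", ["next"], "12", []),
    ("path", "12", [], "121", []),
    ("value", "12", ["next"], "$"),
    ("value", "121", ["lex"], "a"),
]
supported = all(satisfies(f, delta0, alpha, eq) for eq in equations)
```

With this encoding, a structure supports a set of equations exactly when `satisfies` returns true for each of them.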

If $E$ is a set of equations and $M$ is a well defined feature structure such that $M$ satisfies
every equation in $E$ then we say that $M$ supports $E$ and we write

$$M \models E \qquad (3)$$

If $M_1$ and $M_2$ are feature structures, we say that $M_1$ subsumes $M_2$, written $M_1 \sqsubseteq M_2$, if
and only if for every set of equations $E$, if $M_1 \models E$ then $M_2 \models E$. If $M_1$ subsumes $M_2$ then
$M_2$ contains all the information that $M_1$ contains. We see that if $M_1 \models E$ then $M_2 \models E$
for all $M_2$ such that $M_1 \sqsubseteq M_2$. Two feature structures $M_1$ and $M_2$ are equivalent if and
only if $M_1 \sqsubseteq M_2$ and $M_2 \sqsubseteq M_1$. This means that they contain the same information.
Subsumption then gives us a partial order on the equivalence classes of feature structures. Given
a set of equations $E$, we say that $E$ describes a feature structure $M$ if and only if $M \models E$
and for every feature structure $M'$, if $M' \models E$ then $M \sqsubseteq M'$. A given set of equations
may describe different but equivalent feature structures. If an equation set is supported by a
feature structure, then there exists a feature structure which the equation set describes. If $E$
describes a feature structure $M$ we write this

$$E > M \qquad (4)$$

Here we describe feature structures without using unification. In our very simple way 
of defining and describing feature structures this is only a matter of taste and unification 
is just another approach to the same kind of information collecting. To see this, we define 
unification in the usual way: Let $M_1$ and $M_2$ be two well defined feature structures. Then
the unification of $M_1$ and $M_2$, $(M_1 \sqcup M_2)$, is a feature structure such that $M_1 \sqsubseteq (M_1 \sqcup M_2)$,
$M_2 \sqsubseteq (M_1 \sqcup M_2)$ and for every $M'$ such that $M_1 \sqsubseteq M'$ and $M_2 \sqsubseteq M'$, $(M_1 \sqcup M_2) \sqsubseteq M'$.
From the definition we then get that if $E_1 > M_1$ and $E_2 > M_2$ then $(E_1 \cup E_2) > M_1 \sqcup M_2$.
Instead of using unification directly we collect equations and see if all the equations together 
describe a feature structure. 

A set of equations E is consistent if there exists a well defined feature structure that 
E describes. It is possible that an equation set does not describe any well defined feature
structure. We then say that the equation set is inconsistent. This happens for instance if the 
equation set contains both the equations ea = v and eaa = v for a value symbol v. A feature 
structure that satisfies those two equations cannot be atomic. 
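
Consistency of an equation set can be tested by building a candidate least structure and checking that it is well defined. The sketch below uses a union-find (congruence closure) construction, which is a standard technique assumed here for illustration and is not taken from the paper; the names `Solver`, `Clash` and `is_consistent`, and the tuple encoding of equations (`("path", x1, w1, x2, w2)` and `("value", x3, w3, v)`), are hypothetical.

```python
class Clash(Exception):
    """Raised when two different value symbols meet at one node."""


class Solver:
    def __init__(self):
        self.parent, self.attrs, self.value = [], [], []
        self.names = {}

    def new(self):
        self.parent.append(len(self.parent))
        self.attrs.append({})
        self.value.append(None)
        return len(self.parent) - 1

    def find(self, q):
        while self.parent[q] != q:
            q = self.parent[q]
        return q

    def node(self, x, w):
        """Node reached from name x along path w, created on demand."""
        if x not in self.names:
            self.names[x] = self.new()
        q = self.find(self.names[x])
        for a in w:
            if a not in self.attrs[q]:
                self.attrs[q][a] = self.new()
            q = self.find(self.attrs[q][a])
        return q

    def set_value(self, q, v):
        q = self.find(q)
        if self.value[q] not in (None, v):
            raise Clash(v)
        self.value[q] = v

    def union(self, p, q):
        pending = [(p, q)]
        while pending:
            p, q = pending.pop()
            p, q = self.find(p), self.find(q)
            if p == q:
                continue
            vp, vq = self.value[p], self.value[q]
            if vp is not None and vq is not None and vp != vq:
                raise Clash((vp, vq))
            self.parent[q] = p
            self.value[p] = vp if vp is not None else vq
            for a, r in self.attrs[q].items():
                if a in self.attrs[p]:
                    pending.append((self.attrs[p][a], r))  # same attribute: merge targets
                else:
                    self.attrs[p][a] = r

    def well_defined(self):
        roots = {self.find(q) for q in range(len(self.parent))}
        if any(self.value[q] is not None and self.attrs[q] for q in roots):
            return False  # violates atomicity
        color = {}

        def cyclic(q):
            color[q] = 1
            for r in self.attrs[q].values():
                r = self.find(r)
                if color.get(r) == 1 or (r not in color and cyclic(r)):
                    return True
            color[q] = 2
            return False

        return not any(q not in color and cyclic(q) for q in roots)


def is_consistent(equations):
    s = Solver()
    try:
        for eq in equations:
            if eq[0] == "path":
                _, x1, w1, x2, w2 = eq
                s.union(s.node(x1, w1), s.node(x2, w2))
            else:
                _, x3, w3, v = eq
                s.set_value(s.node(x3, w3), v)
    except Clash:
        return False
    return s.well_defined()
```

For the example above, with a hypothetical name `x`, the call `is_consistent([("value", "x", ["a"], "v"), ("value", "x", ["a", "a"], "v")])` returns `False`, since the node reached by `a` would need both an atomic value and an outgoing edge.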
3 The grammar formalism



The Tree Homomorphic Feature Structure Grammar 1 (THFSG) is based on LFG [KB82], but
is much simplified. The main differences are that we have a strong restriction on the sets of
equation schemata, that we treat the lexical items in almost the same way as production rules, and
that we do not have the completeness and coherence constraints or anything like the functional
uncertainty mechanism [KMZ87]. We have instead tried to make the formalism as simple as
possible. This grammar formalism is very much like the grammar formalism GF1 defined
by Colban [Col91], but there is one main difference: we accept empty right hand sides in the
lexicon rules. This gives us the ability to describe a full abstract family of languages, which
GF1 does not [Bur92]. In LFG without functional uncertainty, empty right hand sides are
used in the analysis of long-distance dependencies. In addition, GF1 only accepts equation
schemata of the format that THFSG-grammars have in their normal form.

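Definition 2 A Tree Homomorphic Feature Structure Grammar (THFSG) is a 5-tuple $(\mathcal{K}, S, \Sigma, \mathcal{P}, \mathcal{L})$
over the set of attribute symbols $A$ and value symbols $V$ where

• $\mathcal{K}$ is a finite set of symbols, called categories,

• $S \in \mathcal{K}$ is a symbol, called the start symbol,

• $\Sigma$ is a finite set of symbols, called terminals,

• $\mathcal{P}$ is a finite set of production rules

$$\begin{array}{ccccc} K_0 & \to & K_1 & \ldots & K_m \\ & & E_1 & \ldots & E_m \end{array} \qquad (5)$$

where $m \geq 1$, $K_0, \ldots, K_m \in \mathcal{K}$, and for all $i$, $1 \leq i \leq m$, $E_i$ is a finite set consisting
of one and only one equation schema of the form $\uparrow a_1 \ldots a_n = \downarrow$ where $n \geq 0$ and
$a_1, \ldots, a_n \in A$, and a finite number of equation schemata of the form $\uparrow a_1 \ldots a_n = v$
where $n \geq 1$, $a_1, \ldots, a_n \in A$ and $v \in V$.

• $\mathcal{L}$ is a finite set of lexicon rules

$$\begin{array}{ccc} K & \to & t \\ & & E \end{array} \qquad (6)$$

where $K \in \mathcal{K}$, $t \in (\Sigma \cup \{\varepsilon\})$, and $E$ is a finite set of equation schemata of the form
$\uparrow a_1 \ldots a_n = v$ where $n \geq 1$, $a_1, \ldots, a_n \in A$ and $v \in V$.

The sets $\mathcal{K}$ and $\Sigma$ are required to be disjoint.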

As in LFG, we see that to each element on the right hand side in production and lexicon 
rules we annotate a set of equation schemata. These equation schemata differ from the
equations used to describe feature structures: the schemata have up and down arrows where 
equations have names. The up and down arrows are metavariables: to get equations we 
instantiate the arrows to the nodes in the phrase structure tree. In the production rules each 

1 This grammar formalism is part of a hierarchy of grammar formalisms based on different equation formats
and definitions of grammatical strings described in [Bur92]. What we here call THFSG is there named $RS_1\&T_0$.
Among the other formalisms is $RS_0\&T_0$, which has an undecidable membership question, and $RS_1\&T_2$, for
which we can decide membership in time $O(n^3)$ and which in fact describes the class of context-free languages.
set of equation schemata includes one and only one schema with both up and down arrows.
In this schema we only allow attribute symbols on the left hand side, none on the right hand
side. As a result of this we will later see that the described feature structure will be a tree 
that is homomorphic with the phrase structure tree or constituent structure as we will call 
it. But first we must define constituent structures and the set of grammatical strings with 
respect to a grammar. 

To define the constituent structures we use tree domains: Let $\mathcal{N}_+$ be the set of all integers
greater than zero. A tree domain $D$ is a set $D \subseteq \mathcal{N}_+^*$ of number strings such that if $x \in D$ then
all prefixes of $x$ are also in $D$, and for all $i \in \mathcal{N}_+$ and $x \in \mathcal{N}_+^*$, if $xi \in D$ then $xj \in D$ for all
$j$, $1 \leq j < i$. The out degree $d(x)$ of an element $x$ in a tree domain $D$ is the cardinality of
the set $\{i \mid xi \in D, i \in \mathcal{N}_+\}$. The set of terminals of $D$ is $term(D) = \{x \mid x \in D, d(x) = 0\}$.
The elements of a tree domain are totally ordered lexicographically as follows: $x' \prec x$ if $x'$ is
a prefix of $x$, or there exist strings $y, z, z' \in \mathcal{N}_+^*$ and $i, j \in \mathcal{N}_+$ with $i < j$, such that $x' = yiz'$
and $x = yjz$.

A tree domain $D$ can be viewed as a tree graph in the following way: The elements of $D$
are the nodes, $\varepsilon$ is the root, and for every $x \in D$ the element $xi \in D$ is $x$'s child number $i$.
The terminals of $D$ are then the terminal nodes in the tree.

A tree domain describes the topology of a phrase structure tree. This representation 
provides a name for every node in the tree, directly from the definition of a tree domain. We 
will substitute the arrows used in the equation schemata with these names. A tree domain 
may be infinite, but we restrict the attention to finite tree domains. 2 
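
The tree domain conditions translate directly into code. The following is a small sketch, assuming nodes are encoded as tuples of positive integers with the empty tuple as the root; the function names are illustrative. Python's built-in tuple comparison happens to coincide with the lexicographic order defined above, since a proper prefix compares as smaller.

```python
def is_tree_domain(D):
    """Check prefix closure and left-sibling closure for a finite set of tuples."""
    D = set(D)
    for x in D:
        if not x:
            continue  # the root () has no parent and no siblings
        parent, i = x[:-1], x[-1]
        if i < 1 or parent not in D:
            return False  # the parent of every node must be present
        if i > 1 and parent + (i - 1,) not in D:
            return False  # child number i requires child number i - 1
    return True


def out_degree(D, x):
    """Number of children of x in D."""
    return sum(1 for y in D if len(y) == len(x) + 1 and y[:len(x)] == x)


def terminals(D):
    """Nodes with out degree zero, in lexicographic order."""
    return sorted(x for x in D if out_degree(D, x) == 0)


# A tree domain with two branches below node 1.
D = {(), (1,), (1, 1), (1, 1, 1), (1, 2), (1, 2, 1), (1, 2, 1, 1)}
ordered = sorted(D)
```

Checking only the immediate parent and the immediate left sibling of each node is enough for a finite set, because both conditions then propagate by induction to all prefixes and all smaller child numbers.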

Definition 3 A constituent structure (c-structure) based on a THFSG-grammar $G = (\mathcal{K}, S, \Sigma, \mathcal{P}, \mathcal{L})$
is a triple $(D, K, E)$ where

• $D$ is a finite tree domain,

• $K : D \to (\mathcal{K} \cup \Sigma \cup \{\varepsilon\})$ is a function,

• $E : (D - \{\varepsilon\}) \to \Gamma$ is a function where $\Gamma$ is the set of all sets of equation schemata in $G$,

such that $K(x) \in (\Sigma \cup \{\varepsilon\})$ for all $x \in term(D)$, $K(\varepsilon) = S$, and for all $x \in (D - term(D))$,
if $d(x) = m$ then

$$\begin{array}{ccccc} K(x) & \to & K(x1) & \ldots & K(xm) \\ & & E(x1) & \ldots & E(xm) \end{array} \qquad (7)$$

is a production or lexicon rule in $G$.

The terminal string of a constituent structure is the string $K(x_1) \ldots K(x_n)$ such that
$\{x_1, \ldots, x_n\} = term(D)$ and $x_i \prec x_{i+1}$ for all $i$, $1 \leq i < n$.

Here the function $K$ labels the nonterminal nodes with category symbols and the terminal
nodes with terminal symbols. The terminal string is then a string in $\Sigma^*$ since $K(x) \in (\Sigma \cup \{\varepsilon\})$
for all $x \in term(D)$. The function $E$ assigns a set of equation schemata to each node in
the tree domain. This is done such that each mother-node together with all its daughters
corresponds to a production or lexicon rule. To get equations that can be used to describe
feature structures we must instantiate the up and down arrows in the equation schemata
from the production and lexicon rules. We substitute them with nodes from the c-structure.
For this purpose we define the $'$-function such that $E'(xi) = E(xi)[x/\uparrow, xi/\downarrow]$. We see that
the value of the function $E'$ is a set of equations that feature structures may support.

2 See Gallier [Gal86] for more about tree domains.
Definition 4 The c-structure $(D, K, E)$ generates the feature structure $M$ if and only if

$$\bigcup_{x \in (D - \{\varepsilon\})} E'(x) > M \qquad (8)$$

A c-structure is consistent if it generates a feature structure. 

The nonterminal part of the tree domain will form the name set for feature structures 
that this union describes. A c-structure is consistent if this union is consistent and a string 
is grammatical if its c-structure is consistent. 
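
Instantiating the arrows and collecting the union in Definition 4 can be sketched as follows. The encoding is an assumption made for illustration: a schema with both arrows is written `("down", attrs)`, a value schema is written `("value", attrs, v)`, nodes are tuples of positive integers, and the resulting equations use the tags `"path"` and `"value"`. The sample data is the left subtree of the c-structure for baccccba in Example 1.

```python
def instantiate(schemata, node):
    """E'(xi): replace the up arrow by the mother x and the down arrow by xi."""
    parent = node[:-1]
    equations = set()
    for schema in schemata:
        if schema[0] == "down":    # up a1...an = down
            equations.add(("path", parent, schema[1], node, ()))
        else:                      # up a1...an = v
            equations.add(("value", parent, schema[1], schema[2]))
    return equations


def collect(E):
    """Union of E'(x) over all non-root nodes x; E maps nodes to schema sets."""
    result = set()
    for node, schemata in E.items():
        if node:  # the root () carries no schemata
            result |= instantiate(schemata, node)
    return result


# Schema sets on the left subtree of the c-structure for "baccccba".
E = {
    (1,): {("down", ())},
    (1, 1): {("down", ())},
    (1, 1, 1): {("value", ("lex",), "b")},
    (1, 2): {("down", ("next",))},
    (1, 2, 1): {("down", ()), ("value", ("next",), "$")},
    (1, 2, 1, 1): {("value", ("lex",), "a")},
}
equations = collect(E)
```

The resulting set contains seven equations, one per schema, and a c-structure is consistent when the full set collected this way describes a well defined feature structure.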

Definition 5 Let G be a THFSG-grammar. A string w is grammatical with respect to G if
and only if there exists a consistent c-structure with w as the terminal string. 

The set of all grammatical strings 3 with respect to a grammar G is denoted L(G) and 
is the language that the grammar G generates. Two grammars G and G' are equivalent if 
L(G) = L(G'). 

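Example 1 Assume that next and lex are attribute symbols in $A$, and $a$, $b$, $c$ and $\$$ are value
symbols in $V$. Let $G_1$ be the THFSG-grammar $(\mathcal{K}, S, \Sigma, \mathcal{P}, \mathcal{L})$ where $\mathcal{K} = \{S, B, B', C, C', C''\}$,
$\Sigma = \{a, b, c\}$ and $\mathcal{P}$ contains the following production rules

$$\begin{array}{cccccc} S & \to & B & C & C & B \\ & & \uparrow = \downarrow & \uparrow = \downarrow & \uparrow = \downarrow & \uparrow = \downarrow \end{array} \qquad (9)$$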



c c c 

^ next =1 ^ next =], 



C 



C" 
^ next = 



(10) 



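$$\begin{array}{cccc} B & \to & B' & B \\ & & \uparrow = \downarrow & \uparrow \mathit{next} = \downarrow \end{array} \qquad \begin{array}{ccc} B & \to & B' \\ & & \uparrow = \downarrow \\ & & \uparrow \mathit{next} = \$ \end{array} \qquad (11)$$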



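Moreover, $\mathcal{L}$ contains the following lexicon rules

$$\begin{array}{ccc} B' & \to & a \\ & & \uparrow \mathit{lex} = a \end{array} \qquad \begin{array}{ccc} B' & \to & b \\ & & \uparrow \mathit{lex} = b \end{array} \qquad \begin{array}{ccc} C' & \to & c \\ & & \end{array} \qquad (12)$$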



3 We may have different definitions of which strings are grammatical. For grammars in normal form (see
below) we may also require, for a c-structure to (correctly) generate a feature structure, that for any two nodes
$x$ and $y$ in the c-structure, if $\delta(f(x), w) = f(y)$ then $f(x) = f(x')$ where $x'$ is the greatest common prefix of
$x$ and $y$, or in other words, $x'$ is the closest common predecessor. If we add this constraint we get a grammar
formalism that describes the class of context-free languages [Bur92, Col91].
Figure 1: c-structure for the string "baccccba" in $L(G_1)$.



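Figure 1 shows the c-structure for the string baccccba. The following are the equations we
get from the left subtree after we have instantiated the up and down arrows:

$$\begin{array}{rclcrcl} \varepsilon & = & 1 & \qquad & 1\ \mathit{next} & = & 12 \\ 1 & = & 11 & & 12 & = & 121 \\ 11\ \mathit{lex} & = & b & & 12\ \mathit{next} & = & \$ \\ & & & & 121\ \mathit{lex} & = & a \end{array} \qquad (13)$$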



These are only a subset of all the equations from the c-structure. Figure 2 shows a feature
structure which the c-structure generates. This shows that baccccba is grammatical with
respect to $G_1$. The language generated by $G_1$ is

$$L(G_1) = \{w c^{2n} w \mid w \in \{a, b\}^* \wedge |w| = n \wedge n \geq 1\} \qquad (14)$$



Here we use the attribute next to count the length of the w substring and the attribute lex 
to distribute information about its content. 
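
The language in (14) is easy to test directly, which gives a convenient reference when experimenting with the grammar. The following sketch checks membership from the set description alone; it does not parse with $G_1$, and the function name is illustrative.

```python
def in_L_G1(s):
    """Membership in {w c^(2n) w | w in {a,b}*, |w| = n, n >= 1}."""
    if len(s) == 0 or len(s) % 4 != 0:
        return False  # the total length is n + 2n + n = 4n with n >= 1
    n = len(s) // 4
    w = s[:n]
    return (
        set(w) <= {"a", "b"}
        and s[n:3 * n] == "c" * (2 * n)
        and s[3 * n:] == w
    )
```

For instance, `in_L_G1("baccccba")` is true with $w = ba$ and $n = 2$, while `in_L_G1("baccccab")` is false because the two copies of $w$ differ.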



In this grammar formalism we allow one and only one equation schema with both up
and down arrows in each set of equation schemata in the production rules. Moreover, in this
schema we only allow attribute symbols on the left hand side, none on the right hand
side. As a result the feature structures will be trees and the domination relation in the c-structure
is preserved in the feature structure [Col91]. The domination relation must not be
confused with the lexicographical ordering of the nodes in the c-structure, so let us define
the domination relation on the c-structure and the feature structure: For all the nodes $x$ in
the tree domain $D$ of a c-structure, let $x' \leq_c x$ for all prefixes $x'$ of $x$. This is the traditional
predecessor relation on tree graphs. In the feature structure, let $q' \leq_M q$ for all nodes such
that $\delta(q', w) = q$ for a $w \in A^*$. Then a node in a feature structure dominates another node
if there exists an attribute path from the first node to the second node. For any c-structure
$(D, K, E)$ which generates a feature structure $M$ we then have

$$x' \leq_c x \Rightarrow f(x') \leq_M f(x) \qquad (15)$$

Then the name function $f : D \to Q$ is a homomorphism between the node sets with the
domination relation of those two structures [Col91].

Figure 2: Feature structure for the string "baccccba" in $L(G_1)$. We have omitted the names
here.

We close this presentation of THFSG-grammars by defining a normal form:

Definition 6 A THFSG-grammar $G = (\mathcal{K}, S, \Sigma, \mathcal{P}, \mathcal{L})$ is in normal form if each production
rule in $\mathcal{P}$ is of the form

$$\begin{array}{cccc} K_0 & \to & K_1 & K_2 \\ & & E_1 & E_2 \end{array} \qquad (16)$$

where $K_0, K_1, K_2 \in \mathcal{K}$, and each of the equation schema sets, $E_1$ and $E_2$, is a finite set
consisting of one and only one equation schema of the form $\uparrow a = \downarrow$ or $\uparrow = \downarrow$ where $a \in A$, and
a finite number of equation schemata of the form $\uparrow a_1 \ldots a_n = v$ where $n \geq 1$, $a_1, \ldots, a_n \in A$
and $v \in V$.

We see that a THFSG-grammar is in normal form if every production rule has exactly two
elements on the right hand side and the equation schemata with both up and down arrows
have no more than one attribute symbol.

Lemma 1 For every THFSG-grammar there exists an equivalent THFSG-grammar in normal
form.

Proof: We show how to construct a THFSG-grammar in normal form $G'$ for any THFSG-grammar
$G$ such that $L(G) = L(G')$. There are two constraints for grammars in normal
form, one on the equation schemata, and one on the format of the production rules. First we
show how to get the equation schemata right.

For each set of equation schemata $E_i$ with an equation schema $\uparrow a_1 \ldots a_n = \downarrow$ where $n > 1$
in each production rule

$$\begin{array}{ccccccc} K_0 & \to & K_1 & \ldots & K_i & \ldots & K_m \\ & & E_1 & \ldots & E_i & \ldots & E_m \end{array} \qquad (17)$$

we replace $K_i$ with a unique new category $K'_{i,1}$ and $E_i$ with the set $E'_i = (E_i - \{\uparrow a_1 \ldots a_n = \downarrow\}) \cup \{\uparrow a_1 = \downarrow\}$,
and add the new production rules:

$$\begin{array}{ccc} K'_{i,j-1} & \to & K'_{i,j} \\ & & \uparrow a_j = \downarrow \end{array} \qquad (18)$$

for all $j$, $2 \leq j \leq n$, where $K'_{i,2}, \ldots, K'_{i,n-1}$ are unique new categories and $K'_{i,n} = K_i$. Now each
set of equation schemata in each production rule is as required for the normal form.
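Next, to get the production rules right: For each production rule

$$\begin{array}{ccccc} K & \to & K_1 & \ldots & K_m \\ & & E_1 & \ldots & E_m \end{array} \qquad (19)$$

with $m > 2$, we replace this production rule with the two production rules

$$\begin{array}{cccc} K & \to & K_1 & K'_2 \\ & & E_1 & \uparrow = \downarrow \end{array} \qquad (20)$$

$$\begin{array}{cccc} K'_{m-1} & \to & K_{m-1} & K_m \\ & & E_{m-1} & E_m \end{array} \qquad (21)$$

together with the new production rules

$$\begin{array}{cccc} K'_i & \to & K_i & K'_{i+1} \\ & & E_i & \uparrow = \downarrow \end{array} \qquad (22)$$

for all $i$, $2 \leq i \leq (m - 2)$, where $K'_2, \ldots, K'_{m-1}$ are unique new categories. If $m = 1$ in the
production rule (19) we replace the rule with the new production rule

$$\begin{array}{cccc} K & \to & K_1 & \bar{\varepsilon} \\ & & E_1 & \uparrow = \downarrow \end{array} \qquad (23)$$

and add the lexicon rule

$$\bar{\varepsilon} \to \varepsilon \qquad (24)$$

where $\bar{\varepsilon}$ is a new category.

Now we have a grammar in normal form, and it is easy to see that we have a consistent
c-structure for a string based on the original grammar if and only if we have a consistent
c-structure for the same string based on the grammar in normal form. Then $L(G) = L(G')$.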
□ 
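
The second step of the proof, splitting a long right hand side into binary rules, can be sketched as follows. This is an illustration only: rules are encoded as `(lhs, [(category, schemata), ...])`, the helper `fresh` that supplies new category names is hypothetical, and the schema set $\{\uparrow = \downarrow\}$ on the new categories is an assumption chosen so that the new nodes share the mother's feature structure node.

```python
UP_IS_DOWN = frozenset({("down", ())})  # the schema set { up = down }


def binarize(lhs, rhs, fresh):
    """Split lhs -> rhs (more than two elements) into a chain of binary rules.

    rhs is a list of (category, schemata) pairs; fresh(i) returns the new
    category used in place of the tail starting at element i.
    """
    m = len(rhs)
    if m <= 2:
        return [(lhs, list(rhs))]
    rules = []
    head = lhs
    for i in range(m - 2):
        new_cat = fresh(i + 2)
        rules.append((head, [rhs[i], (new_cat, UP_IS_DOWN)]))
        head = new_cat
    rules.append((head, [rhs[m - 2], rhs[m - 1]]))
    return rules


rules = binarize(
    "K",
    [("K1", "E1"), ("K2", "E2"), ("K3", "E3"), ("K4", "E4")],
    fresh=lambda i: f"K'{i}",
)
```

For a rule with four daughters this yields three binary rules: the first keeps $K_1$ with $E_1$, the intermediate one introduces a further new category, and the last keeps $K_3$ and $K_4$ with their original schema sets.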



4 Full abstract family of languages 

When studying formal grammars we often want to study the class of languages that a grammar
formalism defines. A class of languages $\mathcal{C}_\Gamma$ over a countable set $\Gamma$ of symbols is a set
of languages, such that for each language $L \in \mathcal{C}_\Gamma$ there exists a finite subset $\Sigma$ of $\Gamma$ such that
$L \subseteq \Sigma^*$. The class $\mathcal{C}_\Gamma(GF)$ of languages that a grammar formalism $GF$ defines is the set of
all languages $L'$ over $\Gamma$ such that there exists a grammar $G$ in $GF$ such that $L(G) = L'$.
For a given countably infinite $\Gamma$ an uncountable number of different classes of languages
exist. Some of them are more natural and well-behaved than others, and of particular interest
are the full abstract families of languages (full AFL). A full AFL is a class of languages closed
under union, concatenation, Kleene closure, intersection with regular languages, string homomorphism
and inverse string homomorphism 4. The class of regular languages and the class of context-free
languages are both full AFLs, but the class of context-sensitive languages is not, since it
is not closed under homomorphism [HU79]. Here we show that the class of languages that
the grammar formalism THFSG defines 5, $\mathcal{C}(\text{THFSG})$, is a full abstract family of languages. But
first we need a precise definition of full abstract families of languages.

A string homomorphism is a function $h : \Delta^* \to \Sigma^*$ such that for every $w \in \Delta^*$ and $a \in \Delta$
we have

$$h(\varepsilon) = \varepsilon \qquad (25)$$
$$h(aw) = h(a)h(w) \qquad (26)$$

A string homomorphic image of a language $L \subseteq \Delta^*$ for a string homomorphism $h : \Delta^* \to \Sigma^*$
is the language $\{h(w) \mid w \in L\}$. The inverse string homomorphic image of a language
$L' \subseteq \Sigma^*$ is the language $\{w \mid h(w) \in L'\}$. The concatenation of two languages $L_1$ and $L_2$ is
the language $\{w_1 w_2 \mid w_1 \in L_1 \wedge w_2 \in L_2\}$. The Kleene closure of a language $L$ is the language
$\{w_1 \ldots w_n \mid n \geq 0 \wedge w_1, \ldots, w_n \in L\}$. Union and intersection are the standard set-theoretic
operations. 
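
For finite languages these operations can be tried out directly. The sketch below is only an illustration on finite sets of Python strings: a homomorphism is given as a dictionary from symbols to strings, the inverse image is restricted to candidate words up to a chosen length, and the Kleene closure is truncated to a bounded number of factors. All function names are illustrative.

```python
from itertools import product


def apply_hom(h, w):
    """h(w) for a homomorphism given symbol by symbol; h("") is ""."""
    return "".join(h[a] for a in w)


def hom_image(h, L):
    return {apply_hom(h, w) for w in L}


def inverse_hom_image(h, L_prime, alphabet, max_len):
    """All words over alphabet of length <= max_len whose image is in L_prime."""
    words = ("".join(t) for n in range(max_len + 1)
             for t in product(alphabet, repeat=n))
    return {w for w in words if apply_hom(h, w) in L_prime}


def concat(L1, L2):
    return {u + v for u in L1 for v in L2}


def kleene_up_to(L, k):
    """Words built from at most k factors taken from L."""
    result, layer = {""}, {""}
    for _ in range(k):
        layer = concat(layer, L)
        result |= layer
    return result


h = {"a": "xy", "b": ""}
```

With this `h`, the image of `{"ab", "ba", ""}` is `{"xy", ""}`, which shows how a homomorphism that erases symbols can map several words to the same one.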

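Lemma 2 $\mathcal{C}(\text{THFSG})$ is closed under union, concatenation and Kleene closure.

Proof: Let two THFSG-grammars $G = (\mathcal{K}, S, \Sigma, \mathcal{P}, \mathcal{L})$ and $G' = (\mathcal{K}', S', \Sigma', \mathcal{P}', \mathcal{L}')$ be given
and assume that $(\mathcal{K} \cap \mathcal{K}') = \emptyset$, $S_0 \notin (\mathcal{K} \cup \mathcal{K}' \cup \Sigma \cup \Sigma')$, and that first and next are not used
as attribute symbols in $G$ or $G'$.

Union: Let $G_\cup$ be the grammar $(\mathcal{K} \cup \mathcal{K}' \cup \{S_0\}, S_0, \Sigma \cup \Sigma', \mathcal{P}'', \mathcal{L} \cup \mathcal{L}')$ where $\mathcal{P}''$ is the
least set such that $(\mathcal{P} \cup \mathcal{P}') \subseteq \mathcal{P}''$ and $\mathcal{P}''$ contains the following two production rules:

$$\begin{array}{ccc} S_0 & \to & S \\ & & \uparrow = \downarrow \end{array} \qquad (27)$$

$$\begin{array}{ccc} S_0 & \to & S' \\ & & \uparrow = \downarrow \end{array} \qquad (28)$$

Then $G_\cup$ is a THFSG-grammar and it is trivial that $L(G_\cup) = L(G) \cup L(G')$.

Concatenation: Let $G_{con}$ be the grammar $(\mathcal{K} \cup \mathcal{K}' \cup \{S_0\}, S_0, \Sigma \cup \Sigma', \mathcal{P}''', \mathcal{L} \cup \mathcal{L}')$ where $\mathcal{P}'''$
is the least set such that $(\mathcal{P} \cup \mathcal{P}') \subseteq \mathcal{P}'''$ and $\mathcal{P}'''$ contains the following production rule:

$$\begin{array}{cccc} S_0 & \to & S & S' \\ & & \uparrow \mathit{first} = \downarrow & \uparrow \mathit{next} = \downarrow \end{array} \qquad (29)$$

Then $G_{con}$ is a THFSG-grammar and it is trivial that $L(G_{con}) = L(G)L(G')$.

Kleene closure: Let $G_*$ be the grammar $(\mathcal{K} \cup \{S_0\}, S_0, \Sigma, \mathcal{P}'', \mathcal{L}'')$ where $\mathcal{P}''$ is the least
set such that $\mathcal{P} \subseteq \mathcal{P}''$ and $\mathcal{P}''$ contains the following production rule:

$$\begin{array}{cccc} S_0 & \to & S & S_0 \\ & & \uparrow \mathit{first} = \downarrow & \uparrow \mathit{next} = \downarrow \end{array} \qquad (30)$$

Moreover, $\mathcal{L}''$ is the least set such that $\mathcal{L} \subseteq \mathcal{L}''$ and $\mathcal{L}''$ contains the following lexicon rule:

$$S_0 \to \varepsilon \qquad (31)$$

Then $G_*$ is a THFSG-grammar and it is trivial that $L(G_*) = L(G)^*$. □

4 See Ginsburg [Gin75] for more about full abstract families of languages.

5 We assume here that $\Gamma$ is the set of all symbols that we use and drop $\Gamma$ as subscript in $\mathcal{C}(\text{THFSG})$.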
To show that $\mathcal{C}(\text{THFSG})$ is closed under intersection with regular languages, string homomorphism
and inverse string homomorphism we show that $\mathcal{C}(\text{THFSG})$ is closed under NFT-mapping.
Informally, a Nondeterministic Finite Transducer (NFT) is a nondeterministic
finite state machine with an additional write tape. In addition to just reading symbols
and changing states, an NFT also writes symbols on the write tape. It may write symbols
and change states when reading the empty string. Formally, an NFT is a 6-tuple
$M = (Q, \Delta, \Sigma, \delta_0, q_0, F)$ where $Q$ is a finite set of states, $\Delta$ is an input alphabet, $\Sigma$ is an
output alphabet, $\delta_0$ is a function from $Q \times (\Delta \cup \{\varepsilon\})$ to finite subsets of $Q \times \Sigma^*$, $q_0 \in Q$ is
the initial state and $F \subseteq Q$ is a set of final states.

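For every $q_1, q_2, q_3 \in Q$, $a \in (\Delta \cup \{\varepsilon\})$, $w \in \Delta^*$ and $x, y \in \Sigma^*$, the extended transition
function $\delta$ from $Q \times \Delta^*$ to subsets of $Q \times \Sigma^*$ is defined as the least function satisfying the
following

$$(q_1, \varepsilon) \in \delta(q_1, \varepsilon) \qquad (32)$$
$$(q_2, x) \in \delta(q_1, w) \wedge (q_3, y) \in \delta_0(q_2, a) \Rightarrow (q_3, xy) \in \delta(q_1, wa) \qquad (33)$$

For any NFT $M = (Q, \Delta, \Sigma, \delta_0, q_0, F)$, the NFT-mapping $M$ of a string $w \in \Delta^*$ and a
language $L \subseteq \Delta^*$ is defined as follows:

$$M(w) = \{x \mid \exists q \in F : (q, x) \in \delta(q_0, w)\} \qquad (34)$$
$$M(L) = \bigcup_{w \in L} M(w) \qquad (35)$$

Further, the inverse NFT-mapping $M^{-1}$ of a string $x \in \Sigma^*$ and a language $L' \subseteq \Sigma^*$ is defined
as follows:

$$M^{-1}(x) = \{w \mid x \in M(w)\} \qquad (36)$$
$$M^{-1}(L') = \bigcup_{x \in L'} M^{-1}(x) \qquad (37)$$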

The definition of NFT is sufficiently general that for any given NFT-mapping, the inverse 
NFT-mapping is also an NFT-mapping. A finite state machine is a special version of an 
NFT, which writes every symbol it reads, and does not change state or write anything while 
reading the empty string. If M is a finite state machine version of an NFT, then M(L) is the 
intersection of L and the regular language that the finite state machine describes. 

A string homomorphism $h : \Delta^* \to \Sigma^*$ can be expressed by an NFT. Let $M_h$ be the NFT
$(Q, \Delta, \Sigma, \delta_0, q_0, F)$ such that $Q = F = \{q_0\}$ and for all $a \in \Delta$, $\delta_0(q_0, a) = \{(q_0, h(a))\}$. Then
$h(L) = M_h(L)$ for any language $L \subseteq \Delta^*$ and the inverse string homomorphism can also be
expressed with an NFT-mapping. 
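
The NFT-mapping of a single string can be computed by exploring configurations. The sketch below is an illustration under stated assumptions: $\delta_0$ is a dictionary from `(state, symbol)` to a set of `(state, output)` pairs, the empty string `""` stands for $\varepsilon$, and the parameter `max_eps` is a hypothetical bound on consecutive $\varepsilon$-moves added here so that the search terminates even when $\varepsilon$-loops would make $M(w)$ infinite.

```python
def nft_image(delta0, q0, finals, w, max_eps=4):
    """Outputs x with (q, x) in delta(q0, w) for a final q, up to the eps bound."""
    results = set()
    seen = set()
    # configuration: (state, input position, output so far, eps-moves since last read)
    stack = [(q0, 0, "", 0)]
    while stack:
        conf = stack.pop()
        if conf in seen:
            continue
        seen.add(conf)
        q, i, out, eps = conf
        if i == len(w) and q in finals:
            results.add(out)
        if i < len(w):  # read the next input symbol
            for r, y in delta0.get((q, w[i]), ()):
                stack.append((r, i + 1, out + y, 0))
        if eps < max_eps:  # move on the empty string
            for r, y in delta0.get((q, ""), ()):
                stack.append((r, i, out + y, eps + 1))
    return results


# The NFT M_h for the homomorphism h(a) = xy, h(b) = empty string.
hom_delta = {("q0", "a"): {("q0", "xy")}, ("q0", "b"): {("q0", "")}}

# A finite state machine as an NFT: it copies its input and accepts words ending in b.
fsm_delta = {
    ("s0", "a"): {("s0", "a")}, ("s0", "b"): {("s1", "b")},
    ("s1", "a"): {("s0", "a")}, ("s1", "b"): {("s1", "b")},
}
```

Here `nft_image(hom_delta, "q0", {"q0"}, "ab")` gives `{"xy"}`, and `nft_image(fsm_delta, "s0", {"s1"}, w)` is `{w}` when `w` ends in `b` and empty otherwise, which is the intersection behaviour described above.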

By showing that the class $\mathcal{C}(\text{THFSG})$ is closed under NFT-mapping it follows that it is
closed under intersection with regular languages, string homomorphism and inverse string 
homomorphism. We do this by first defining a grammar from a THFSG-grammar in normal 
form and an NFT, and then show that this grammar generates the NFT-mapping of the 
language generated by the first grammar. 
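Definition 7 Given a THFSG-grammar $G = (\mathcal{K}, S, \Delta, \mathcal{P}, \mathcal{L})$ in normal form and a Nondeterministic
Finite Transducer $M = (Q, \Delta, \Sigma, \delta_0, q_0, F)$. Assume that the symbols $S_0$ and $\bar{a}$ for
all $a \in (\Sigma \cup \{\varepsilon\})$ are not used in $G$. The grammar $G_M = (\mathcal{K}', S_0, \Sigma, \mathcal{P}', \mathcal{L}')$ for the NFT-image
$M(L(G))$ is defined as follows:

Let $\mathcal{K}'$ be the set $(Q \times (\mathcal{K} \cup \Delta \cup \{\varepsilon\}) \times Q) \cup \{\bar{a} \mid a \in (\Sigma \cup \{\varepsilon\})\} \cup \{S_0\}$ and let $\mathcal{P}'$ and $\mathcal{L}'$ be
the least sets such that:

a) For all $q \in F$, the following is a rule in $\mathcal{P}'$:

$$\begin{array}{ccc} S_0 & \to & (q_0, S, q) \\ & & \uparrow = \downarrow \end{array} \qquad (38)$$

b) For all production rules

$$\begin{array}{cccc} K_0 & \to & K_1 & K_2 \\ & & E_1 & E_2 \end{array} \qquad (39)$$

in $\mathcal{P}$ and all $q_1, q_2, q_3 \in Q$, the following is a rule in $\mathcal{P}'$:

$$\begin{array}{cccc} (q_1, K_0, q_3) & \to & (q_1, K_1, q_2) & (q_2, K_2, q_3) \\ & & E_1 & E_2 \end{array} \qquad (40)$$

c) For all lexicon rules

$$\begin{array}{ccc} K & \to & b \\ & & E \end{array} \qquad (41)$$

in $\mathcal{L}$ and all $q_1, q_2 \in Q$, the following is a rule in $\mathcal{P}'$:

$$\begin{array}{ccc} (q_1, K, q_2) & \to & (q_1, b, q_2) \\ & & E \cup \{\uparrow = \downarrow\} \end{array} \qquad (42)$$

d) For all $q_1, q_2, q_3 \in Q$ and $b \in (\Delta \cup \{\varepsilon\})$, the following are rules in $\mathcal{P}'$:

$$\begin{array}{cccc} (q_1, b, q_3) & \to & (q_1, b, q_2) & (q_2, \varepsilon, q_3) \\ & & \uparrow = \downarrow & \uparrow = \downarrow \end{array} \qquad (43)$$

$$\begin{array}{cccc} (q_1, b, q_3) & \to & (q_1, \varepsilon, q_2) & (q_2, b, q_3) \\ & & \uparrow = \downarrow & \uparrow = \downarrow \end{array} \qquad (44)$$

e) For all $q_1, q_2 \in Q$, $b \in (\Delta \cup \{\varepsilon\})$ and $y \in \Sigma^*$ such that $(q_2, y) \in \delta_0(q_1, b)$, where
$y = a_1 \ldots a_n$ for $|y| = n \geq 1$, or if $y = \varepsilon$ let $a_1 = \varepsilon$ and $n = 1$, the following is a
production rule in $\mathcal{P}'$:

$$\begin{array}{ccccc} (q_1, b, q_2) & \to & \bar{a}_1 & \ldots & \bar{a}_n \\ & & \uparrow = \downarrow & \ldots & \uparrow = \downarrow \end{array} \qquad (45)$$

f) For all $a \in (\Sigma \cup \{\varepsilon\})$, the following is a rule in $\mathcal{L}'$:

$$\bar{a} \to a \qquad (46)$$

Figure 3: Transformation to the NFT-mappings grammar for $(p, y) \in \delta(q, x)$.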


The main idea in this definition is that if a node in a c-structure based on $G$ with category
$K$ is the root of a sub-c-structure with $x$ as terminal string and the NFT accepts $x$ as input
string in a state $q$, then there is a corresponding node in a c-structure based on $G_M$ with
category $(q, K, p)$. This node is the root of a sub-c-structure with $y$ as terminal string such
that $(p, y) \in \delta(q, x)$, or less formally, such that the NFT may write $y$ when reading the
string $x$ while moving from state $q$ to $p$ (Figure 3). This is done such that the new c-structure
gives a specification of how the NFT processes the input string, changes states and writes
symbols. Downwards in the new c-structure we get more and more details of how the string is
processed. In the end the grandmothers of the terminal nodes correspond to each transition
step.

In the definition, parts a), b) and c) give us, for any c-structure based on $G$ with $w = b_1 \ldots b_n$
as terminal string, the upper part of a new c-structure based on $G_M$, where the upper part
is isomorphic with the first c-structure except that it will have an additional root node on
the top. The main point here is that the terminal nodes in the first c-structure will have
corresponding nodes with possible categories $(q_0, b_1, q_1), (q_1, b_2, q_2), \ldots, (q_{n-1}, b_n, q_n)$ in the
new one, for any sequence of states $q_0, q_1, \ldots, q_n$ where $q_0$ is the initial state and $q_n$ is a final
state. This is done such that if a node has (exactly) two daughters labeled $(q, K_1, q')$ and
$(q'', K_2, q''')$, $q'$ must be equal to $q''$ and the mother node must be labeled $(q, K_0, q''')$ where
$K_0$, $K_1$ and $K_2$ are the categories labeling the corresponding nodes in the first c-structure.
Part d) in the definition allows the NFT to write symbols and change states while reading the
empty string. In part e) we limit the previous parts of the definition such that all c-structures
must correspond to the transition function in the NFT. This is achieved by requiring that for
any symbol $b \in (\Delta \cup \{\varepsilon\})$, the triple category $(q_1, b, q_2)$ can only label the grandmother nodes
of the terminal nodes in a c-structure if in fact there exists a one step transition from state
$q_1$ to $q_2$ while reading $b$. The daughters of this node have nonterminal categories representing
the output symbols of this one step transition. The last part of the definition, f), is just the
lexical complement of part e).

With respect to the sets of equation schemata we take these with us to the new c-structure 
such that we get the same constraints on the new c-structures as on the c-structure based on 
the original grammar. 

Lemma 3 C(thfsg) is closed under NFT-mapping. 
Proof: Given Definition 7 we have to show that for all strings $u \in \Sigma^*$, $u \in L(G_M)$ if and
only if there exist a string $w$ in $L(G)$ and a final state $q \in F$ such that $(q, u) \in \delta(q_0, w)$.

($\Rightarrow$) Assume that we have a consistent c-structure based on $G_M$ with $u$ as terminal
string, so that $u \in L(G_M)$.

1) By induction on the height of the nodes we have from d), e) and f) in Definition 7 that if
a node with category $(q, b, q')$ is the root of a sub-c-structure with $y$ as terminal string,
where $b \in (\Delta \cup \{\varepsilon\})$, then $(q', y) \in \delta(q, b)$.

2) By a top-down induction on the c-structure we have from a), b), c) and d) in Definition 7
that for any horizontal node cut of nodes labeled with triple categories
$(q_1, \beta_1, q'_1), \ldots, (q_n, \beta_n, q'_n)$, where
$\beta_1, \ldots, \beta_n \in (\mathcal{K} \cup \Delta \cup \{\varepsilon\})$, we have
$q'_i = q_{i+1}$ for all $i$, $1 \le i \le (n - 1)$, $q_1$ is the initial state and $q'_n$ is a
final state.

3) There exists a sequence of the topmost nodes with triple categories in which each $\beta_i$
is in $(\Delta \cup \{\varepsilon\})$ and each node has a mother node with a category
$(q', K_i, q_i)$ for $K_i \in \mathcal{K}$. This sequence forms a node cut, and if
$(q_0, b_1, q_1), (q_1, b_2, q_2), \ldots, (q_{n-1}, b_n, q_n)$ are the categories labeling these
nodes in lexicographical order, this sequence gives us a string $w = b_1 \ldots b_n$ in
$\Delta^*$. The concatenation of the terminal strings $y_1, \ldots, y_n$ of the sub-c-structures
of which these nodes are the roots is $u$. From the definition of the extended transition
function and the induction in the first part we have that $(q_n, u) \in \delta(q_0, w)$.

4) By reversing Definition 7 b) and c) it is straightforward to construct a c-structure for
$w$ based on $G$, and if the c-structure for $u$ generates a feature structure, so must the one
for $w$. Then $w \in L(G)$.

($\Leftarrow$) Assume that we have a $w \in L(G)$ and a final state $q$ such that
$(q, u) \in \delta(q_0, w)$ for a string $u \in \Sigma^*$. Since $(q, u) \in \delta(q_0, w)$ there
must be a processing of $w$ by the NFT with $u$ as output. Following the discussion of
Definition 7 it is straightforward to construct a c-structure for $u$ based on $G_M$ which
specifies the processing of $w$ in $M$. If the c-structure for $w$ generates a feature
structure, so must the new one, since we do not add any substantial new equations. Then we
have that $u \in L(G_M)$. $\Box$
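
To make the NFT-mapping used in the lemma concrete, the following sketch enumerates the outputs an NFT can write while reading a given input string. The dictionary encoding of the transition function, the name `nft_outputs` and the bound `max_eps` on consecutive empty-string moves (added only to guarantee termination) are assumptions of this sketch and do not come from the paper's definitions.

```python
from collections import deque
from typing import Dict, Set, Tuple

# delta maps (state, input symbol) to a set of (next state, output string).
# The input symbol "" stands for the empty string (an epsilon move).
Delta = Dict[Tuple[str, str], Set[Tuple[str, str]]]


def nft_outputs(delta: Delta, q0: str, finals: Set[str], w: str,
                max_eps: int = 2) -> Set[str]:
    """Outputs u such that the NFT reads all of w from q0, writes u and stops
    in a final state, using at most max_eps consecutive epsilon moves."""
    results: Set[str] = set()
    start = (q0, 0, "", 0)  # (state, input position, output so far, epsilon run)
    seen = {start}
    queue = deque([start])
    while queue:
        state, pos, out, eps_run = queue.popleft()
        if pos == len(w) and state in finals:
            results.add(out)
        moves = []
        if pos < len(w):
            # Read the next input symbol; this resets the epsilon counter.
            for nxt, written in delta.get((state, w[pos]), set()):
                moves.append((nxt, pos + 1, out + written, 0))
        if eps_run < max_eps:
            # Write and change state without consuming input.
            for nxt, written in delta.get((state, ""), set()):
                moves.append((nxt, pos, out + written, eps_run + 1))
        for config in moves:
            if config not in seen:
                seen.add(config)
                queue.append(config)
    return results
```

Applying such a function to every string of a language and collecting the results gives the image of the language under the transducer, which is the operation the lemma is about.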

From Lemma 2 and Lemma 3 we have the main result in this section. 

Theorem 1 C(thfsg) is a full abstract family of languages. 

5 Cross-Serial Dependencies 

During the last ten to fifteen years the discussion of whether natural languages can be
described by context-free grammars has been revived [GP82]. This discussion distinguishes
between a grammar's capacity to describe a language strongly, i.e., to describe the language
as a structured set, and weakly, i.e., to describe the language as a set of strings. Cross-serial
dependencies are one of the main characteristics used to show that context-free grammars
cannot describe natural language even weakly.

Cross-serial dependencies occur in languages like $\{xx \mid x \in \Sigma^*\}$^6 and
$\{w a^m b^n x c^m d^n y \mid w, x, y \in \Sigma^*,\ m, n \ge 1,\ a, b, c, d \in \Sigma\}$, but not in
languages like $\{x x^R \mid x \in \Sigma^*\}$^7, where we have nested dependencies.
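
The difference between the two kinds of dependency can be seen in a minimal sketch of recognizers for the copy language and the mirror language. The function names are chosen here for illustration.

```python
def is_copy(s: str) -> bool:
    """True if s = xx for some x: position i depends on position i + |x|,
    so the dependencies cross each other (cross-serial)."""
    half, odd = divmod(len(s), 2)
    return odd == 0 and s[:half] == s[half:]


def is_mirror(s: str) -> bool:
    """True if s = x x^R for some x: position i depends on its mirror
    position, so the dependencies are properly nested."""
    return len(s) % 2 == 0 and s == s[::-1]
```

The string `abab` is accepted only by `is_copy` and `abba` only by `is_mirror`. The second check can be carried out with a single stack, while the first cannot, which is the intuition behind the context-free versus non-context-free distinction.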

Shieber [Shi85] has shown that Swiss German has cross-serial dependencies on the syntax
level, and therefore in a weak description of the language. This is due to two facts about
Swiss German:

"First, Swiss German uses case-marking (dative and accusative) on objects, just
as standard German does; different verbs subcategorize for objects of different
case. Second, Swiss German, like Dutch, allows cross-serial order for the structure
of subordinate clauses. Of critical importance is the fact that Swiss German
requires appropriate case-marking to hold even within the cross-serial construction."
Shieber [Shi85] (page 334).

6 We assume that $\Sigma$ has more than one symbol.
7 If $x = a_1 \ldots a_n$ then $x^R = a_n \ldots a_1$.

This occurs e.g. in the following subordinate clauses preceded by "Jan säit das" ("Jan
says that"):^8

    ... mer  em Hans    es huus         hälfed  aastriiche
    ... we   Hans(DAT)  the house(ACC)  helped  paint            (47)
    ... we helped Hans paint the house.

Here the verb hälfed subcategorizes for an object in the dative, em Hans, and the verb
aastriiche subcategorizes for an object in the accusative, es huus. Shieber shows that this
dependency is robust and that it holds in quite complex clauses, as seen in this example:

    ... mer  d'chind            em Hans    es huus
    ... we   the children(ACC)  Hans(DAT)  the house(ACC)

        haend  wele    laa  hälfe  aastriiche
        have   wanted  let  help   paint                         (48)

    ... we have wanted to let the children help Hans paint the house.

If we change the cases of the objects, the strings become ungrammatical. Shieber (p.
336) states four claims that this construction in Swiss German satisfies:

1. "Swiss-German subordinate clauses can have a structure in which all the Vs follow all 
the NPs. " 

2. "Among such sentences, those with all dative NP's preceding all accusative NPs, and all
dative-subcategorizing Vs preceding all accusative-subcategorizing Vs are acceptable."

3. "The number of Vs requiring dative objects (e.g., hälfe) must equal the number of dative
NPs (e.g., em Hans) and similarly for accusatives (laa and chind)."

4. "An arbitrary number of Vs can occur in a subordinate clause of this type (subject, of 
course, to performance constraints)." 

Shieber then shows that any language that satisfies these claims cannot be context-free, since
such languages allow constructions of the form $w a^m b^n x c^m d^n y$. Here we study the language
$L$ which contains strings of the form

    Jan säit das mer $N_1 \ldots N_n$ es huus haend wele $V_1 \ldots V_n$ aastriiche    (49)

where $n \ge 1$, $N_i \in \{$em Hans, es Hans, d'chind$\}$^9 and $V_i \in \{$hälfe, laa$\}$ for all $i$,
$1 \le i \le n$, and such that $V_i =$ hälfe if and only if $N_i =$ em Hans.

We see that this is a subset of Swiss German with the right case marking and
subcategorization, and that it satisfies Shieber's claims. Hence it cannot be context-free.
To make it easier to study we use the following homomorphism^10:

8 All linguistic data are from Shieber [Shi85]. 

9 For simplicity we define the constructions em Hans, es Hans and d'chind as atomic symbols. 
10 We can do this since our grammar formalism is closed under string homomorphism and inverse string 
homomorphism 
\[
\begin{array}{rcl}
h(\textit{Jan säit das mer}) & = & x \\
h(\textit{es huus haend wele}) & = & y \\
h(\textit{aastriiche}) & = & z \\
h(s) & = & s \quad \text{for all } s \in (N_{all} \cup V_{all})
\end{array}
\tag{50}
\]



where $N_{all}$ is the set {em Hans, es Hans, d'chind} and $V_{all}$ is the set {hälfe, laa}. We
then have that $h(L)$ is the following language:



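\[
\begin{array}{rl}
h(L) = \{\, x\, N_1 \ldots N_n\, y\, V_1 \ldots V_n\, z \mid & n \ge 1 \ \wedge \\
 & \forall i,\ 1 \le i \le n\ [\, N_i \in N_{all} \wedge V_i \in V_{all} \wedge
   (V_i = \textit{hälfe} \Leftrightarrow N_i = \textit{em Hans}) \,] \,\}
\end{array}
\tag{51}
\]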
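
The definition of $h(L)$ can be read as a direct membership test. The following is a minimal sketch in which a string is given as a list of atomic symbols. The function name `in_h_of_L` and the list encoding are assumptions made for illustration.

```python
N_ALL = {"em Hans", "es Hans", "d'chind"}
V_ALL = {"hälfe", "laa"}


def in_h_of_L(tokens):
    """Decide membership in h(L) for a list of atomic symbols."""
    # The shortest member has the form x N y V z, i.e. five symbols.
    if len(tokens) < 5 or tokens[0] != "x" or tokens[-1] != "z":
        return False
    body = tokens[1:-1]
    if body.count("y") != 1:
        return False
    split = body.index("y")
    nouns, verbs = body[:split], body[split + 1:]
    if not nouns or len(nouns) != len(verbs):
        return False
    if any(n not in N_ALL for n in nouns) or any(v not in V_ALL for v in verbs):
        return False
    # Cross-serial condition: the i-th verb is hälfe exactly when
    # the i-th noun phrase is the dative em Hans.
    return all((v == "hälfe") == (n == "em Hans") for n, v in zip(nouns, verbs))
```

The final line is where the cross-serial character shows: the i-th noun is paired with the i-th verb, not with the verb in the mirror position.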

We construct the following THFSG-grammar $G = (\mathcal{K}, S, \Sigma, \mathcal{P}, \mathcal{L})$ for the language $h(L)$:

Let

$\Sigma$ = {em Hans, es Hans, d'chind, hälfe, laa, x, y, z}, and
$\mathcal{K} = \{S, VP, V, NP, N, X, Y, Z\}$.

We have the following production rules in $\mathcal{P}$:

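\[
\begin{array}{ccccccc}
S & \rightarrow & X & NP & Y & VP & Z \\
  &             & \uparrow = \downarrow & \uparrow = \downarrow & \uparrow = \downarrow & \uparrow = \downarrow & \uparrow = \downarrow
\end{array}
\]

\[
\begin{array}{cccc}
NP & \rightarrow & N & NP \\
   &             & \uparrow \mathit{obj} = \downarrow & \uparrow \mathit{vcomp} = \downarrow
\end{array}
\qquad
\begin{array}{ccc}
NP & \rightarrow & N \\
   &             & \uparrow \mathit{obj} = \downarrow \\
   &             & \uparrow \mathit{vcomp} = \mathit{null}
\end{array}
\]

\[
\begin{array}{cccc}
VP & \rightarrow & V & VP \\
   &             & \uparrow \mathit{obj} = \downarrow & \uparrow \mathit{vcomp} = \downarrow
\end{array}
\qquad
\begin{array}{ccc}
VP & \rightarrow & V \\
   &             & \uparrow \mathit{obj} = \downarrow \\
   &             & \uparrow \mathit{vcomp} = \mathit{null}
\end{array}
\]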



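We have the following lexicon rules in $\mathcal{L}$:

\[
\begin{array}{ll}
N \rightarrow \textit{em Hans} & \uparrow \mathit{case} = \mathrm{DAT} \\
N \rightarrow \textit{es Hans} & \uparrow \mathit{case} = \mathrm{ACC} \\
N \rightarrow \textit{d'chind} & \uparrow \mathit{case} = \mathrm{ACC} \\
V \rightarrow \textit{laa}     & \uparrow \mathit{obj}\ \mathit{case} = \mathrm{ACC} \\
V \rightarrow \textit{hälfe}   & \uparrow \mathit{obj}\ \mathit{case} = \mathrm{DAT} \\
X \rightarrow x & \\
Y \rightarrow y & \\
Z \rightarrow z &
\end{array}
\]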

From this grammar we get that strings like "x em Hans d'chind y hälfe laa z" are
grammatical, while a string like "x es Hans d'chind y hälfe laa z" is ungrammatical because of an
inconsistency in the equation set. In Figure 4 we show the c-structure and feature structure
for the string

    "x d'chind em Hans y laa hälfe z"    (52)

This is not meant as an adequate linguistic analysis, but as an example of how we may
collect cross-serial information with a THFSG-grammar.
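
The way the equation set becomes inconsistent can be sketched with a small unification routine over tree-shaped feature structures. This is a simplification and not a literal interpreter for the annotated rules: it assumes, as the grammaticality judgments in the text imply, that the case of the i-th noun and the object case required by the i-th verb end up at the same position in the feature structure, i steps down the vcomp chain. The names `chain`, `unify` and `feature_structure`, and the nested-dictionary encoding, are hypothetical.

```python
CASE = {"em Hans": "DAT", "es Hans": "ACC", "d'chind": "ACC"}
REQUIRED = {"hälfe": "DAT", "laa": "ACC"}


def chain(cases):
    """Nest a list of cases as obj/vcomp levels; the last level has vcomp = 'null'."""
    fs = "null"
    for case in reversed(cases):
        fs = {"obj": {"case": case}, "vcomp": fs}
    return fs


def unify(a, b):
    """Unify two tree-shaped feature structures; return None on a clash."""
    if isinstance(a, dict) and isinstance(b, dict):
        merged = dict(a)
        for attr, value in b.items():
            if attr in merged:
                sub = unify(merged[attr], value)
                if sub is None:
                    return None
                merged[attr] = sub
            else:
                merged[attr] = value
        return merged
    # Atomic values (or an atom against a structure) unify only if equal.
    return a if a == b else None


def feature_structure(nouns, verbs):
    """Feature structure for x N1..Nn y V1..Vn z, or None if the equations clash."""
    if not nouns or not verbs:
        return None
    return unify(chain([CASE[n] for n in nouns]),
                 chain([REQUIRED[v] for v in verbs]))
```

Under these assumptions, the noun sequence "es Hans d'chind" with the verb sequence "hälfe laa" fails because ACC and DAT clash at the first obj, and sequences of different lengths fail because the atom null meets a non-atomic vcomp value.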



Figure 4: c-structure and feature structure for cross-serial dependencies

6 Summary and remarks

We have defined a grammar formalism that describes a full abstract family of languages
and showed that it can weakly describe a small subset of Swiss German with cross-serial
dependencies. The method used to show that THFSG describes a full abstract family of
languages is of some independent interest. It seems to be applicable to many other unification
grammar formalisms with a context-free phrase structure backbone. The method basically
requires that the equation sets are more or less uniform in the phrase structure and that we
have the possibility to add "no information" equation sets. Additional constraints on how
information is collected, shared and distributed in the phrase-structure tree may complicate
its application.

There are two potential disadvantages to THFSG. Firstly, its membership problem is
NP-hard [Col91]. This is due to the feature structures' capacity to collect and distribute
information across the sentence. This makes it possible to distribute truth assignments
uniformly over boolean expressions and then define a grammar that accepts only satisfiable
expressions.

Secondly, does it have enough linguistic flexibility? By this we mean, is it possible in
THFSG to express linguistic phenomena, as precisely as possible, in the way linguists would
wish to state them? In THFSG we have a simple way of describing feature structures. As a
result of this the feature structures will be trees. It may be argued that this is too limited
compared to the much richer formalisms used in HPSG and LFG. On the other hand, on
the string level of natural languages, cross-serial dependencies are to my knowledge the only
constructions that are outside the context-free domain. Therefore THFSG should be strong
enough to describe the string sets of natural languages. However, we will not draw any strong
conclusions regarding the linguistic adequacy of this grammar formalism, but leave it as an
open question.